16 research outputs found

    Efficient Iterative Processing in the SciDB Parallel Array Engine

    Full text link
    Many scientific data-intensive applications perform iterative computations on array data. There exist multiple engines specialized for array processing. These engines efficiently support various types of operations, but none includes native support for iterative processing. In this paper, we develop a model for iterative array computations and a series of optimizations. We evaluate the benefits of optimized, native support for iterative array processing in the SciDB engine on real workloads from the astronomy domain.
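
    As an illustration of the kind of computation targeted here (a minimal NumPy sketch of a fixed-point iteration over an array, not the paper's actual model or SciDB's API):

```python
import numpy as np

def iterative_smooth(array, tol=1e-3, max_iters=100):
    """Repeatedly average each cell with its four neighbors until the
    largest per-cell update drops below `tol` (a fixed-point iteration)."""
    current = array.astype(float)
    for _ in range(max_iters):
        padded = np.pad(current, 1, mode="edge")
        neighbors = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                     padded[1:-1, :-2] + padded[1:-1, 2:])
        updated = (current + neighbors) / 5.0
        if np.max(np.abs(updated - current)) < tol:
            return updated
        current = updated
    return current

# Example: iteratively smooth a noisy 2-D array until convergence.
result = iterative_smooth(np.random.rand(256, 256))
```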

    Angiotensin-converting enzyme genotype and late respiratory complications of mustard gas exposure

    Get PDF
    Background: Exposure to mustard gas frequently results in long-term respiratory complications. However, the factors which drive the development and progression of these complications remain unclear. The Renin Angiotensin System (RAS) has been implicated in lung inflammatory and fibrotic responses. Genetic variation within the gene coding for the Angiotensin Converting Enzyme (ACE), specifically the Insertion/Deletion (I/D) polymorphism, is associated with variable levels of ACE and with the severity of several acute and chronic respiratory diseases. We hypothesized that the ACE genotype might influence the severity of late respiratory complications of mustard gas exposure. Methods: 208 Kurdish patients who had suffered high exposure to mustard gas, as defined by cutaneous lesions at initial assessment, in Sardasht, Iran on June 29, 1987, underwent clinical examination, spirometric evaluation and ACE Insertion/Deletion genotyping in September 2005. Results: ACE genotype was determined in 207 subjects. As a continuous variable, FEV1 % predicted tended to be higher in association with the D allele: 68.03 ± 20.5%, 69.4 ± 21.4% and 74.8 ± 20.1% for the II, ID and DD genotypes respectively. Median FEV1 % predicted was 73, and this was taken as a cut-off between groups defined as having better or worse lung function. The ACE DD genotype was overrepresented in the better spirometry group (chi-square 4.9, p = 0.03). Increasing age at the time of exposure was associated with reduced FEV1 % predicted (p = 0.001), whereas gender was not (p = 0.43). Conclusion: The ACE D allele is associated with higher FEV1 % predicted when assessed 18 years after high exposure to mustard gas.
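
    The genotype comparison above is a standard 2 x 2 chi-square test; a minimal sketch with scipy, using purely hypothetical counts (the abstract does not report the group sizes per genotype):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 table (illustrative counts only):
# rows = DD genotype vs. II/ID, columns = better vs. worse spirometry group.
table = np.array([[30, 15],
                  [73, 89]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```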

    Spatial, temporal, and demographic patterns in prevalence of smoking tobacco use and attributable disease burden in 204 countries and territories, 1990-2019 : a systematic analysis from the Global Burden of Disease Study 2019

    Get PDF
    Background: Ending the global tobacco epidemic is a defining challenge in global health. Timely and comprehensive estimates of the prevalence of smoking tobacco use and attributable disease burden are needed to guide tobacco control efforts nationally and globally. Methods: We estimated the prevalence of smoking tobacco use and attributable disease burden for 204 countries and territories, by age and sex, from 1990 to 2019 as part of the Global Burden of Diseases, Injuries, and Risk Factors Study. We modelled multiple smoking-related indicators from 3625 nationally representative surveys. We completed systematic reviews and did Bayesian meta-regressions for 36 causally linked health outcomes to estimate non-linear dose-response risk curves for current and former smokers. We used a direct estimation approach to estimate attributable burden, providing more comprehensive estimates of the health effects of smoking than previously available. Findings: Globally in 2019, 1.14 billion (95% uncertainty interval 1.13-1.16) individuals were current smokers, who consumed 7.41 trillion (7.11-7.74) cigarette-equivalents of tobacco in 2019. Although prevalence of smoking had decreased significantly since 1990 among both males (27.5% [26.5-28.5] reduction) and females (37.7% [35.4-39.9] reduction) aged 15 years and older, population growth has led to a significant increase in the total number of smokers from 0.99 billion (0.98-1.00) in 1990. Globally in 2019, smoking tobacco use accounted for 7.69 million (7.16-8.20) deaths and 200 million (185-214) disability-adjusted life-years, and was the leading risk factor for death among males (20.2% [19.3-21.1] of male deaths). 6.68 million [86.9%] of the 7.69 million deaths attributable to smoking tobacco use were among current smokers. Interpretation: In the absence of intervention, the annual toll of 7.69 million deaths and 200 million disability-adjusted life-years attributable to smoking will increase over the coming decades. Substantial progress in reducing the prevalence of smoking tobacco use has been observed in countries from all regions and at all stages of development, but a large implementation gap remains for tobacco control. Countries have a clear and urgent opportunity to pass strong, evidence-based policies to accelerate reductions in the prevalence of smoking and reap massive health benefits for their citizens. Copyright (C) 2021 The Author(s). Published by Elsevier Ltd. Peer reviewed.
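
    For a rough sense of how smoking-attributable burden relates to prevalence and relative risk (a textbook population attributable fraction with illustrative numbers only, not the study's direct estimation approach):

```python
def attributable_fraction(prevalence, relative_risk):
    """Levin's population attributable fraction for a single exposure level:
    PAF = p * (RR - 1) / (p * (RR - 1) + 1)."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (excess + 1.0)

# Illustrative inputs: 20% smoking prevalence and a relative risk of 2.5
# for some smoking-related cause of death.
paf = attributable_fraction(0.20, 2.5)
print(f"PAF = {paf:.1%}")  # share of deaths from that cause attributable to smoking
```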

    Multi-versioned Data Storage and Iterative Processing in a Parallel Array Database Engine

    No full text
    Thesis (Ph.D.)--University of Washington, 2014

    Scientists today are able to generate data at an unprecedented scale and rate. For example, the Sloan Digital Sky Survey (SDSS) generates 200 GB of data containing millions of objects each night during routine operation. The Large Hadron Collider produces even more data, approximately 30 PB annually, and the Large Synoptic Survey Telescope (LSST) will produce approximately 30 TB of data per night within a few years. Moreover, in many fields of science, multidimensional arrays rather than flat tables are the standard data type, because data values are associated with coordinates in space and time. For example, images in astronomy are 2D arrays of pixel intensities, and climate and ocean models use arrays or meshes to describe 3D regions of the atmosphere and oceans. As a result, scientists need powerful tools to help them manage massive arrays. This thesis focuses on several challenges in building parallel array data management systems that facilitate massive-scale data analytics over arrays.

    The first challenge in building an array data processing system is simply how to store arrays on disk. The key question is how to partition arrays into smaller fragments, called chunks, that form the unit of I/O, processing, and data distribution across machines in a cluster. We explore this question in ArrayStore, a new read-only storage manager for parallel array processing. In ArrayStore, we study the impact of different chunking strategies on query processing performance for a wide range of operations, including binary operators and user-defined functions. ArrayStore also proposes two new techniques that enable operators to access data from adjacent array fragments during parallel processing.

    The second challenge that we explore is the ability to create, archive, and explore different versions of the array data. We address this question in TimeArr, a new append-only storage manager for an array database. Its key contribution is to efficiently store and retrieve versions of an entire array or of a sub-array. To achieve high performance, TimeArr relies on several techniques, including virtual tiles, bitmask compression of changes, variable-length delta representations, and skip links.

    The third challenge that we tackle is how to provide efficient iterative computation on multi-dimensional scientific arrays. We present the design, implementation, and evaluation of ArrayLoop, an extension of SciDB with native support for array iterations. In the context of ArrayLoop, we develop a model for iterative processing in a parallel array engine. We then present three optimizations to improve the performance of these computations: incremental processing, mini-iteration overlap processing, and multi-resolution processing.

    Finally, as motivation for our work and to help push our technology back into the hands of science users, we have built the AscotDB system. AscotDB is a new, extensible data analysis system for the interactive analysis of data from astronomical surveys. AscotDB provides a compelling and powerful environment for the exploration, analysis, visualization, and sharing of large array datasets.
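
    As a toy illustration of the delta-based versioning idea behind an append-only array store (a sketch assuming NumPy arrays, not TimeArr's actual on-disk format):

```python
import numpy as np

def encode_version(previous, current):
    """Represent a new array version as a change mask plus the changed values."""
    mask = previous != current
    return mask, current[mask]

def decode_version(previous, mask, changed_values):
    """Reconstruct a version from its predecessor and the stored delta."""
    restored = previous.copy()
    restored[mask] = changed_values
    return restored

v0 = np.zeros((4, 4))
v1 = v0.copy()
v1[1, 2] = 7.0                      # small change between consecutive versions
mask, values = encode_version(v0, v1)
assert np.array_equal(decode_version(v0, mask, values), v1)
```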

    Hybrid merge/overlap execution technique for parallel array processing

    No full text
    Whether in business or science, multi-dimensional arrays are a common abstraction in data analytics, and many systems exist for efficiently processing arrays. As datasets grow in size, it is becoming increasingly important to process these arrays in parallel. In this paper, we discuss different types of array operations and review how they can be processed in parallel using two existing techniques. The first technique, which we call merge, consists of partitioning an array, processing the partitions in parallel, then merging the results to reconcile computations that span partition boundaries. The second technique, which we call overlap, consists of partitioning an array into subarrays that overlap by a given number of cells along each dimension. Thanks to this overlap, the array partitions can be processed in parallel without any merge phase. We discuss when each technique can be applied to an array operation. We show that even for a single array operation, a different approach may yield the best performance for different regions of an array. Following this observation, we introduce a new parallel array processing technique that combines the merge and overlap approaches. Our technique enables a parallel array processing system to mix and match the merge and overlap techniques within a single operation on an array. Through experiments on real scientific data, we show that this hybrid approach outperforms the other two techniques.
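
    A minimal NumPy sketch of the overlap technique for a one-dimensional window operation (illustrative only, not the paper's implementation): each partition is extended with ghost cells from its neighbors, so the per-partition results can simply be concatenated without a merge phase.

```python
import numpy as np

def moving_average_with_overlap(array, num_parts, halo):
    """Centered moving average of width 2*halo + 1, computed partition by
    partition using `halo` overlap cells instead of a post-processing merge."""
    n = len(array)
    bounds = np.linspace(0, n, num_parts + 1, dtype=int)
    kernel = np.ones(2 * halo + 1) / (2 * halo + 1)
    pieces = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        ext_lo, ext_hi = max(lo - halo, 0), min(hi + halo, n)   # add ghost cells
        smoothed = np.convolve(array[ext_lo:ext_hi], kernel, mode="same")
        pieces.append(smoothed[lo - ext_lo:hi - ext_lo])        # keep owned cells only
    return np.concatenate(pieces)

data = np.arange(20, dtype=float)
assert np.allclose(moving_average_with_overlap(data, num_parts=4, halo=2),
                   np.convolve(data, np.ones(5) / 5, mode="same"))
```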

    Assessing the potential of open educational resources for open and distance learning

    Get PDF
    In many emerging applications, data streams are monitored in a network environment. Due to limited communication bandwidth and other resource constraints, a critical and practical demand is to compress data streams online and continuously with a quality guarantee. Although many data compression and digital signal processing methods have been developed to reduce data volume, their super-linear time and more-than-constant space complexity prevent them from being applied directly to data streams, particularly over resource-constrained sensor networks. In this paper, we tackle the problem of online quality-guaranteed compression of data streams using fast linear approximation (i.e., using line segments to approximate a time series). Technically, we address two versions of the problem which explore quality guarantees in different forms. We develop online algorithms with linear time complexity and constant space cost. Our algorithms are optimal in the sense that they generate the minimum number of segments that approximate a time series with the required quality guarantee. To meet the resource constraints in sensor networks, we also develop a fast algorithm that creates connecting segments using very simple computation. The low cost of our methods gives them a unique edge in applications involving massive, fast streaming environments, low-bandwidth networks, and nodes with heavily constrained computational power. We implement and evaluate our methods in an acoustic wireless sensor network application.
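
    One simple greedy variant of error-bounded piecewise linear approximation, shown as a sketch (it keeps every point within +/- eps of its segment, but it is not necessarily the paper's optimal or connecting-segment algorithm):

```python
def pla_compress(points, eps):
    """Greedy piecewise linear approximation with a max-error guarantee.
    points: list of (t, v) pairs with strictly increasing t.
    Returns segments (t_start, v_start, t_end, v_end); every covered point
    lies within +/- eps of its segment."""
    segments = []
    i, n = 0, len(points)
    while i < n - 1:
        t0, v0 = points[i]
        lo, hi = float("-inf"), float("inf")      # feasible slope range
        end = i + 1
        for j in range(i + 1, n):
            t, v = points[j]
            dt = t - t0
            # Slopes that keep point j within +/- eps of a line through (t0, v0).
            new_lo = max(lo, (v - eps - v0) / dt)
            new_hi = min(hi, (v + eps - v0) / dt)
            if new_lo > new_hi:                   # no single line fits i..j; close segment
                break
            lo, hi, end = new_lo, new_hi, j
        slope = (lo + hi) / 2.0
        t_end = points[end][0]
        segments.append((t0, v0, t_end, v0 + slope * (t_end - t0)))
        i = end
    return segments

# Example: a noisy ramp compresses to a handful of segments with eps = 0.5.
series = [(t, 0.1 * t + (0.3 if t % 7 == 0 else 0.0)) for t in range(100)]
print(len(pla_compress(series, eps=0.5)), "segments")
```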

    Deep Learning in Archaeological Remote Sensing: Automated Qanat Detection in the Kurdistan Region of Iraq

    No full text
    In this paper, we report the results of our work on automated detection of qanat shafts in Cold War-era CORONA satellite imagery. The increasing quantity of air- and space-borne imagery available to archaeologists and the advances in computational science have created an emerging interest in automated archaeological detection. Traditional pattern recognition methods proved to have limited applicability for archaeological prospection, for a variety of reasons, including a high rate of false positives. Since 2012, however, a breakthrough has been made in the field of image recognition through deep learning. We have tested the application of deep convolutional neural networks (CNNs) for automated remote sensing detection of archaeological features. Our case study is the qanat systems of the Erbil Plain in the Kurdistan Region of Iraq. The signature of the underground qanat systems in the remote sensing data is the semi-circular openings of their vertical shafts. We chose to focus on qanat shafts because they are promising targets for pattern recognition and because the richness and extent of the qanat landscapes cannot be properly captured across vast territories without automated techniques. Our project is the first effort to use automated techniques on historic satellite imagery that offers neither spectral resolution nor very high (sub-meter) spatial resolution.
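
    As a generic sketch of the CNN approach described (not the authors' architecture, data, or training setup), a small binary classifier in PyTorch that could label imagery patches as shaft versus background:

```python
import torch
import torch.nn as nn

class PatchCNN(nn.Module):
    """A small convolutional classifier for 64x64 grayscale patches
    (illustrative architecture only)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, 2),                    # two classes: shaft vs. background
        )

    def forward(self, x):                        # x: (batch, 1, 64, 64)
        return self.classifier(self.features(x))

model = PatchCNN()
logits = model(torch.randn(8, 1, 64, 64))        # 8 random patches
print(logits.shape)                              # torch.Size([8, 2])
```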

    Longitudinal Study of a Building-Scale RFID Ecosystem

    No full text
    Radio Frequency IDentification (RFID) deployments are becoming increasingly popular in both industrial and consumer-oriented settings. To effectively exploit and operate such deployments, important challenges must be addressed, from managing RFID data streams to handling limitations in reader accuracy and coverage. Furthermore, deployments that support pervasive computing raise additional issues related to user acceptance and system utility. To better understand these challenges, we conducted a four-week study of a building-scale EPC Class-1 Generation-2 RFID deployment, the "RFID Ecosystem", with 47 readers (160 antennas) installed throughout an 8,000 square meter building. During the study, 67 participants carrying over 300 tags accessed the collected RFID data through applications including an object finder, a friend tracker, and several tools for managing personal data. We found that our RFID deployment produces a very manageable amount of data overall, but with orders-of-magnitude differences among participants and objects. We also found that tag detection rates tend to be low, with high variance across tag type, participant, and object. Users need expert guidance to effectively mount their tags and are encouraged by compelling applications to wear tags more frequently. Finally, probabilistic modeling and inference techniques promise to enable more complex applications by smoothing over gaps and errors in the data, but they must be applied with care as they add significant computational and storage overhead.
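
    A toy sketch of the gap-smoothing idea mentioned above (illustrative only, not the deployment's actual probabilistic inference): short runs of missed reads flanked by detections are treated as if the tag had stayed in range.

```python
def smooth_detections(reads, max_gap=3):
    """Fill short gaps in a per-tag detection stream.
    reads: list of 0/1 flags, one per read cycle (1 = tag detected)."""
    smoothed = list(reads)
    i = 0
    while i < len(reads):
        if reads[i] == 0:
            j = i
            while j < len(reads) and reads[j] == 0:
                j += 1
            # Flanked by detections and short enough: assume the tag stayed put.
            if 0 < i and j < len(reads) and (j - i) <= max_gap:
                smoothed[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return smoothed

print(smooth_detections([1, 1, 0, 0, 1, 0, 0, 0, 0, 1], max_gap=3))
# -> [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
```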